Longest Common Subsequence in k Length Substrings
Abstract
In this paper we define a new problem, LCSk, motivated by computational biology. LCSk asks for the maximal number of k-length substrings that match in both input strings while preserving their order of appearance. The traditional LCS definition is the special case of our problem where k = 1. We provide an algorithm that solves the general case in O(n^2) time, where n is the length of the input strings. This equals the time required for the special case k = 1. The space requirement of the algorithm is O(kn). We also define a complementary EDk distance measure and show that EDk(A,B) can be computed in O(nm) time and O(km) space, where m and n are the lengths of the input sequences A and B, respectively.
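The LCSk recurrence can be illustrated with a short dynamic-programming sketch. This is a minimal illustration under stated assumptions, not the authors' algorithm. The function name `lcsk` is hypothetical. The sketch uses the usual prefix-table formulation: the value for prefixes a[:i] and b[:j] is the best of skipping a character of either string or ending with one more matching k-length block. A table of common-suffix lengths makes each block test O(1). The running time is O(nm), which is O(n^2) for equal-length inputs. Unlike the O(kn) bound stated above, the sketch keeps full O(nm) tables for readability.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Hypothetical sketch: maximum number of non-overlapping k-length
    substrings matching in both a and b in the same order (LCSk).
    Uses full O(nm) tables, not the O(kn)-space scheme from the abstract."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # suf[i][j]: length of the longest common suffix of a[:i] and b[:j].
    suf = [[0] * (m + 1) for _ in range(n + 1)]
    # dp[i][j]: LCSk value for the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            # Option 1: drop the last character of a or of b.
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Option 2: a[i-k:i] == b[j-k:j] exactly when the common suffix
            # has length >= k, so this block test costs O(1), not O(k).
            if suf[i][j] >= k:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With k = 1 the sketch reduces to the classic LCS length. For example, `lcsk("ABCBDAB", "BDCABA", 1)` gives 4. With k = 2, `lcsk("abxcd", "abycd", 2)` gives 2, from the blocks "ab" and "cd".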
Similar resources
Efficient algorithms for the longest common subsequence in $k$-length substrings
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...
Longest Common Subsequence in at Least k Length Order-Isomorphic Substrings
We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of at least k length substrings. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are lengths of the input strings. Furthermore, we mainly consider the LCS in at least k length ...
Sparse Dynamic Programming for Longest Common Subsequence from Fragments
Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt–Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to so...
A BSP/CGM Algorithm for the All-Substrings Longest Common Subsequence Problem
Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < √m processors that takes O(mn/p) time an...
Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms
We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σⁿ and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase st...
Publication date: 2013